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Convergence of sequential Quasi-Monte 
Carlo smoothing algorithms 

Mathieu Gerber* Nicolas Chopin^ 


Gerber and Chopin (2015) recently introduced Sequential quasi-Monte Carlo 


(SQMC) algorithms as an efficient way to perform filtering in state-space 
models. The basic idea is to replace random variables with low-discrepancy 
point sets, so as to obtain faster convergence than with standard particle 
filtering. Gerber and Chopin (2015) describe briefly several ways to extend 
SQMC to smoothing, but do not provide supporting theory for this extension. 
We discuss more thoroughly how smoothing may be performed within SQMC, 
and derive convergence results for the so-obtained smoothing algorithms. We 
consider in particular SQMC equivalents of forward smoothing and forward 
filtering backward sampling, which are the most well-known smoothing tech¬ 
niques. As a preliminary step, we provide a generalization of the classical 
result of Hlawka and Muck (1972) on the transformation of QMC point sets 
into low discrepancy point sets with respect to non uniform distributions. As 
a corollary of the latter, we note that we can slightly weaken the assumptions 
to prove the consistency of SQMC. 

Keywords: Hidden Markov models; Low discrepancy; Particle filtering; 
Quasi-Monte Carlo; Sequential quasi-Monte Carlo; Smoothing; State-space 
models. 


1. Introduction 

State-space models are popular tools to model real life phenomena in many fields such as 
Economics, Engineering and Neuroscience. These models are mainly used for extracting 
information about a hidden Markov process (xQt>o of interest from a set of observations 
yo :T '■= (yo, • ■ •, y t). In practice, this typically translates to the estimation of p(x^|yo : t), 
the distribution of given the data yo^, 0 < t < T (called the filtering distribution), 
and/or to p(xo : T|yo:T) (called the smoothing distribution). However, these distributions 
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are intractable in most cases, and require to be approximated in some way, the most 


popular being particle filtering (Sequential Monte Carlo). See e.g. the books of Doucet 


et al. (2001), Cappe et al. (2005) for more background on state-space models and particle 


filters. 

Recently, Gerber and Chopin (2015) introduced sequential quasi-Monte Carlo (SQMC) 
as an efficient alternative to particle filtering. Essentially, SQMC amounts to replacing 
the random variates generated by a particle filter with a QMC (low-discrepancy) point set; 
that is a set of N points that are selected so as to cover more evenly the space that random 



ised QMC) point sets, the convergence rate of SQMC (with respect to N, the number 
of simulations) is at worst Op(N _1 / 2 ), while it is op(N ~ 1//2 ) on the class of continuous 
and bounded functions. (This of course compares favourably to the Op(N~ 1 ^ 2 ) rate of 
particle filtering.) In addition, the numerical results of Gerber and Chopin (2015| show 
that SQMC dramatically outperforms particle filtering in several applications. 

One important question that remains however is how to use SQMC to obtain smoothing 
estimates that converge as IV —> +oo. Smoothing is recognised as a more difficult problem 
than filtering (Briers et al., 2010). Smoothing algorithms typically require extra steps 
on top of particle filtering (such as a backward pass), and often cost 0(N 2 ) (but some 
variants cost O(N), as discussed later). 

This paper discusses existing smoothing algorithms, explains how they may be adapted 
to SQMC, and presents convergence results for the corresponding SQMC smoothing 
algorithms. We first study forward smoothing, where trajectories are carried forward in 
the particle filter, and show that this approach leads to consistent estimates in SQMC. 
Then, we derive a SQMC version of forward filtering backward sampling (where complete 
trajectories are simulated from the positions simulated by a particle filter, see |Godsill| 


et al. 2004), and establish convergence results for the so obtained smoothing estimates. 


We also consider the marginal version of backward sampling, which usually allows for a 
more precise estimation of marginal smoothing distributions. 

The rest of this paper is organized as follows. Section [2] introduces the model and 
the notations considered in this work, and give a short description of SQMC. Section [3] 
contains some preliminary results that will be needed to study SQMC smoothing. We 
first present a new consistency result for the forward step, which has the advantage to rely 
on weaker assumptions than in Gerber and Chopin (2015), and state a result relative 
to the backward decomposition and SQMC estimation of the smoothing distribution. 
Then, we provide a generalization of the classical result of Hlawka and Muck (1972) on 
the transformation of QMC point sets into low discrepancy point sets with respect to 
non uniform distributions that is essential to the analysis of QMC smoothing algorithms. 
This section ends with some results on the conversion of discrepancies through the Hilbert 
space filling curve. In Section [I] we establish the consistency of QMC forward smoothing 
while our results on QMC forward-backward smoothing are given Section [5] In Section [6] 
a numerical study examines the performance of the QMC smoothing strategies discussed 
in this work while Section 0 concludes. 
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2. Preliminaries 

2.1. Model and related notations 


Let (xt)t>o a Markov chain, defined on a space A C (equipped with Lebesgue meas¬ 
ure), with initial distribution mo(dxo), transition kernel mt(x-t-l, dxj), t > 1 , and let 
(Gt) t >0 a sequence of (measurable) potential functions, G$ : A — > M + , Gt '■ X x X —» M + . 


As in 

Gerber and Chopin 

(2015), and most of the QMC literature, we take X = [0, l) rf , 

but see Section 3 of 

Gerber and Chopin 

(2015 

) for how to generalise our results to un- 


bounded state spaces. 


For this Feynman-Kac model (mt,Gt)t> o, let Qt and Qt be the probability measures 
on X such that, for any bounded measurable function <p : X —> R, 


®t(<P) = 7 — E 
At -1 


t-i 


QtfaO = 

At. 


Z, = E 


^(xi)G'o(xo) G s (x s _i,x s ) 

S=1 

t 

ip(x t )G 0 (x 0 ) G s (x s _i,x s ) 

S=1 
t 

G'o(xo) G s (x g _i,x s ) 


S=1 


where expectations are with respect to the law of Markov chain (x*), and empty products 
equal one. Similarly, let Qt be the probability measure on X t+l such that, for any 
bounded test function (p : X t+l —> R, 


®t(<p) = TrE 
Ait 


t 

<p(x 0 : t )Go(x 0 ) G s (x s _i,x s ) 

s =1 


In the sequel, the notation 0 : t is used to denote the set of integers {0,..., t} and xo : t 
denotes the collection {x s }* =0 . Similarly, in what follows we use the shorthand x 1:N for 
a collection {x”}^^ of N points in R rf , and xj:^ for collection {xg. t }()L 1 of N points in 
jj(t+i)d. Finally, for a probability measure ir £ V(X), with V(X) the set of probability 
measures on X absolutely continuous with respect to the Lebesgue measure, n((p) denotes 
the expectation of <p(x) under tt. 

To make more transparent the connection between this Feynman-Kac formulation and 
state-space models, assume Markov chain (x*) is observed indirectly through y t, which 
has conditional probability density f' 1 (yt|x() (with respect to an appropriate measure, 
typically Lebesgue). If we take G 0 (x 0 ) = f Y (y 0 |x 0 ), G t (x t -i,x t ) = f y (yt|x t ) for t > 0, 
Qt(dxt) becomes the filtering distribution (the law of xt|yo : t), Q t (dx() the predictive 
distribution (the law of X(|yo : t_i), and Qf(dxo : t), the object of interest in this work, 
namely the smoothing distribution (the law of xo : t|yo : t)- In addition, Zf is the marginal 
likelihood of observations yo-.t- In this case, Gt depends only on xt, but having a Gt that 
depends on both xt-i and xt makes it possible to apply our results to a more general 
class of algorithms (such as those where the Markov transition used to simulate particles 
differs from the Markov transition of the model). 
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2.2. Extreme norm and QMC point sets 


As in Gerber and Chopin (2015), our consistency results are stated in term of the extreme 
norm, defined, for two probability measures 7Ti and 7T2 on [0, l) rf , by 

II 7Tl — 7T2 ||e = sup \iri(B) — 1 T 2 (B)\ 


BeB 




where 


B 


[o,i)‘ 


= {B = JJ[a,, bi], 0 < ai < bi < 1}. 


2—1 


Note that \\ttn — 7t||e —> 0 implies that 7 Tpj(ip) —>• vr(c^) for any bounded and continuous 
function ip (by portmanteau lemma, see e.g. Lemma 2.2, p.6 of Van der Vaart 2007). 
In words, consistency for the extreme norm implies consistency of estimates for test 
functions <p that are bounded and continuous. 

The extreme norm is natural in QMC contexts since it can be viewed as the general¬ 
ization of the extreme discrepancy of a point set u 1:Ar in [0, l) d , defined by 


D(u ) = ||S(u ) — Arf|| E 

where denotes the Lebesgue measure on W* and S is the operator 

N 


S : u 


1 :N 




72=1 


The extreme discrepancy therefore measures how a point set spreads evenly over [0, l) c 


and is used to define formally QMC point sets. To be more specific, 


u 


1:N 


is a QMC 


point set in [0, l) d if D(u 1:N ) = 0(N~ 1 (logN) d ). Note that, for a sample u 1:JV of N 
IID uniform random numbers in [0, l) d , D(u 1:N ) = 0{N~ l t 2 log log N) almost surely 
by the law of iterated logarithm (see e.g. Niederreiter| 1992| page 167 ). There exist 
many constructions of QMC point sets in the literature (see |Niederreiter| 1992 Dick and 
Pillichshammer 2010, for more details on this topic) and, although we write u 1:A/ rather 


than u N ' 1:N , 


u 


:N 


may not necessarily be the N first points of a fixed sequence, i.e. one 


may have u^’^ 1 ^ u N—i,N-i^ jq Q wever, it is worth keeping in mind that all the results 
presented in this paper hold both for point sets and sequences. 

Even if in this work we are mainly interested in consistency results (which hold for 
deterministic point sets u 1:JV ), we will sometimes refer to randomized QMC (RQMC) 
point sets. Formally, u 1:Ar is RQMC point set if it is a QMC point set with probability 
one and if, marginally, u n ~ U{[ 0, l) d ) for all n E 1 : IV. 


2.3. The Hilbert space-filling curve 

The Hilbert space filling curve plays a key role in the construction and the analysis of 
SQMC. This curve is a Holder continuous fractal map H : [0,1] —> [0, l] d that fills com¬ 
pletely [0, l] rf ; see Figure[l]for a graphical depiction, and Appendix | a| for a presentation of 
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m= 2 
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Figure 1: First four iterates of sequence H m , the limit of which is the Hilbert curve Ff, 
for d = 2 (source: He and Owen (2014)) 


its mains properties. In what follows, we denote by h : [0, \] d —> [0,1] its pseudo-inverse 
which verifies, for any x G [0, l] d , H o h(x) = x, and, for d = 1, we use the natural 
convention that H and h are the identity mappings, i.e. H(x) = h{x) = x , Vx G [0,1]. 
The Hilbert curve is not uniquely defined; in this work, we assume that H is such that 
H( 0) = 0 G [0, l] rf (this is in fact the classical way to construct the Hilbert curve, see 
e.g. Hamilton and Rau-Chaplin 2008). This technical assumption is needed in order to 
be consistent with the fact that we work with left-closed and right-opened hypercubes 
since, in that case, h([0, l) d ) = [0,1). 


2.4. Rosenblatt transform 

Another important technical tool for SQMC is the Rosenblatt transform. For a probab¬ 
ility distribution it over [0,1), F n denotes its CDF (cumulative distribution), and F 1 ” 1 
its inverse CDF; i.e. F~ l = inf{x G [0,1) : F{x) > «}. More generally, for a probability 
distribution it over X = [0, l) d , F n denotes the Rosenblatt transform, that is 

^V(x) = (i ? 7ri i(xi),F 7ri2 (x 2 |xi),...,F 7ri(i (x rf |xi :rf _i)) r , x = (xi,... ,x d ) T G X, 

where F n i is the CDF of the marginal distribution of the first component (relative to 7r), 
and for i > 2, FV,i( - |®i:i-i) is the CDF of component Xj, conditional on (xi,.. ., Xj_i), 
again relative to it. The inverse of F v is denoted F~ l . Note how the Rosenblatt transform 
and its inverse define a monotonous map that transforms any distribution into a uniform 
distribution, and vice-versa. 

We overload this notation for Markov kernels: i, •) is the the Rosenblatt trans¬ 

form of probability distribution mt(xt_i,dxt) (for fixed x^_i G X), and FT 1 is defined 
similarly. 


2.5. Sequential quasi-Monte Carlo 

The basic structure of SMC (Sequential Monte Carlo, also known as particle filtering) 
algorithms is recalled as Algorithm [T] One sees from this description that SMC is a 
class of iterative algorithms that use resampling and mutation steps to move from a 
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Algorithm 1 SMC Algorithm 


Generate (for n £ 1 : N) Xq ~ mo(dxo) 

Compute (for n E 1 : N) Wg* = G 0 (xg)/Em=i GoW) 
for f = 1 to t = T do 

Generate (for n £ 1 : N) u™ ~ ZY[0,1) and set a” = where F t ^(rn) = 

En=iW t n t(n<m) 

a n 

Generate (for n £ 1 : A) x™ ~ mt(x^ l _ 1 , dxQ, where x( ! _ 1 = x i ^_ 1 
Compute (for n E 1 : N) W" = G,^, x?)/£^ = i x?*) 

end for 


discrete approximation Q^(dxt) of Qt(dxt) to a discrete approximation Q(\ 1 (dxt_|_i) of 
Q t+ i(dx t+ i), where 


N 


Qf(dxt) = X! ^"<5*? ( dx *)> 


77r— 1 


A closer look at Algorithm [l] shows that, for t > 1, the resampling and the mutation 
steps together amounts to sampling from the (random) distribution on X 2 defined by 


7r( v (d(x i _i,x t )) = Qt_ x <g> mt(d(x t _i,xt)) 


N 


(1) 


where, for a probability measure ir £ P([0, l) rfl ) and a kernel /\ : [0, l) dl — > V([0, 1)), 
the notation 7r (8) iv(d(xi,X 2 )) denotes the probability measure 7r(dxi)/C(xi,dx 2 ) on 
[0,l) dl+d2 . 

Based on this observation, the basic idea of SQMC is to replace the uniform random 
numbers used at iteration t > 1 of an SMC algorithm to sample from (JT|) by a QMC point 
set u^ :JV of appropriate dimension. In the deterministic version of SQMC, the only known 
property of u| :iV is that its discrepancy converges to zero as N goes to infinity. Thus, we 
must make sure that the transform applied to u| :Ar preserves consistency (relative the 
extreme norm): i.e. D(u 1:N ) —> 0 implies that ||T^(u 1:JV ) — tt f 1 ||e —> 0, where T^ is the 
chosen transformation. 

When the state-space is univariate, Gerber and Chopin (2015) propose to use for T^ 


the inverse Rosenblatt transformation of 7r( v described in the previous subsection, which 
amounts to sample from (jlj as follows: 


Xt-1 = F ^(u t ), 


Xt = F m l(x t -I,v t ), 


( Ut,v t ) ~ A([0, if 


However, when the state variable is multivariate (i.e. d > 1) this approach cannot be 
directly used because in that case Q( v ( 1 (dxt_i) is a weighted sum of Dirac measures over 
X C M d . 

To extend this approach to multidimensional state-space models, Gerber and Chopin 
2015) transform the multivariate distribution Q( v ( 1 (dxi_i) into a univariate distribution 
h (dht-i) by applying the change of variable h : x £ X —> [0,1), where h is the 
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Algorithm 2 SQMC Algorithm 

Generate a QMC point set uj :iv in [0, l) d 
Compute (for n £ 1 : N) Xq = -F“*(uo) 

Compute (for n £ 1 : N) Wtf = G 0 (xft)/ Em=i G o(xo) 
for t = 1 to t = T do 

Generate a QMC point set u) :JV in [0, l) d+1 , let u™ = (it™, v”), where it” £ [0,1), 
v™ £ [0, l) d . Assume that, for all n,m E 1 : N, n < m => it™ < it™' 

Hilbert sort: hnd permutation (Jt-i such that *'*'’) < ... < /t(x^ l(jV '*) 

Compute (for n £ 1 : N) a™_ 1 = F^{ u t) where F t ^(m) = YliLi < m) 

Compute (for n £ 1 : N) x™ = F“ t 1 (x™_ 1 , v™), where x™_ 1 = x^ 1 

Compute (for n £ 1 : N) W t n = Gt(x™_ l5 xf)/ Ylm=i xj™) 

end for 


pseudo-inverse of the Hilbert curve (see Section 3.4). Using this change of variable, the 
resampling and mutation steps of SMC are equivalent to sampling from 


7 r t V h( d ( / h- i, x t )) = Qt-i,h ® m t ,h(d(/it—l, x t )) 


IN 


( 2 ) 


where mt xQ := As for the univariate setting, one can generate 

random variates form TT^ h (d(ht~i, Xj)) using the inverse Rosenblatt transformation of 
this distribution; that is, we can sample (ht~\ . xQ from (|2]) as follows: 

^- 1=i? SA (“*)’ x * = i mt 1 (- H '(^-i)> v t)» («t.vt) ~^([0,l) d+1 ). 


The resulting SQMC algorithm, which is therefore based for t > 1 on d + 1-dimensional 
QMC point sets u* :Ar , u™ = (ix™, v") £ [0, l) d+1 , is presented in Algorithm [ 2 ] 


3. Preliminary results 


3.1. Consistency of SQMC 

The consistency of Algorithm [2] (a s N —> + 00 , with respect to the extreme metric) was 
established in Gerber and Chopin (2015 Theorem 5), under the assumption that F mt is 
Lipschitz. We generalise below this result to the case where F mt is Holder continuous, 
as this generalisation will be needed later on when dealing with the backward step. This 
also allows us to recall some of the assumptions that will be repeated throughout the 
paper. For convenience, let F mt (x t _i,xQ = F mo (xq) when t = 0. 


Theorem 1. Consider the set-up of Algorithm where, for all t £ 0 : T, (u) :Ar )jv>i 
is a sequence of point sets in [0, \) dt , with do = d and dt = d + 1 for t > 0, such that 
D(uj :N ) -> 0 as N —> +00. Assume the following holds for all t E 0 : T: 


1. The x™ ’s are pairwise distinct: x™ 7 ^ x™ for u / m £ 1 : N; 
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2. Gt is continuous and bounded; 


3. F mt (x t _i,x t ) is such that 

a) For i E 1 : d and for a fixed x.' , the i-th coordinate of F mt (x 7 ,x) is strictly 
increasing in X{ E [0,1), the i-th coordinate of x; 

b) Viewed as a function of x' and x, T mt (x',x) zs Holder continuous; 

c) For i E 1 : d, mufx', xi-.i-i, dxt), the distribution of the component x; condi¬ 

tional on (xi, Xi-i) relative to mt(x',dx), admits a density pu(xi\xi : i-i, x 7 ) 
with respect to the Lebesgue measure such that || (• | •) 11 oo < +oo. 


4- Qi(dxt) = pt(x.t)Xd(dxt) where ptfix-t) is a strictly positive bounded density. 

For t E 1 : T, let P f N h = . Then, under Assumptions filO we have, for 

t E 1: T, 

— Qt—i,ft ® > 0, as N —> +oo 

and, for t E 0 : T, 

||Qf — Q* ||e -^ 0, as N — >■ +oo. 


The difference with Gerber and Chopin (2015, Theorem 5) is Assumption [3j where 


[3c] was not needed but it was assumed that F mt is a Lipschitz function. In this work, 
Assumption [3c] will be required to establish the validity of the backward step. Assumption 
[l] is a technical condition that is verified almost surely for the randomized version of 


SQMC while assuming that Gt is bounded is standard in particle filtering (Del Moral 
2004). In our notations, we drop the dependence of point sets on N 


i.e. 


we write 

(x 1: ^)at>i rather than (x^ 1 ^)^!, although in full generality x 1:Ar may not necessarily 
be the N first points of a fixed sequence. 

The proof of Theorem [1] is omitted since it can be directly deduced from the proof 
of Gerber and Chopin]j2015, Theorem 5) and from the generalization of the result of 


Hlawka and Muck (1972, “Satz 2”) presented in the Section 3.3, which constitutes one of 


the key ingredients to study the backward pass of SQMC. 


3.2. Backward decomposition 


Backward smoothing algorithms require that Markov kernel mt(xj_i,dxQ admits a 
(strictly positive) probability density which may be computed pointwise; mj(xj_i, dxQ = 
mt(xi_i,xt)Ad(dxt), with mt(x t _i,dxQ > 0 (and being Lebesgue measure in our 
case). 


The backward decomposition of the smoothing distribution is (e.g. Del Moral et al. 


2010 ): 


Q T (dx 0 ;T) = Qt^xt) JJ( x z, dx t _! 


( 3 ) 


t= l 
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where, for any n G V(X) and t G 1 : T, Mt,n ■ X —> V{X) is the Markov kernel such 
that 

TWt j7r (x t ,dx t _i) := Gt(x t _i,xt)7r(dx t _i) 


with 


G t {x.t- i,x t ) : = 


m t (xt i, xj) G t (xt i, xj) 


m (x t _i, x t )G t (x t _i, x t )7r(dx t _i) ' 

sv that the plug-in estimate of 
([3]) , is consistent for the extreme norm; see Appendix 


( 4 ) 


As a preliminary result, we show that the plug-in estimate Qlf of Qt, obtained by 
replacing Qt with Q^ in 
proof. 


B.l 


for a 


Theorem 2. Consider the set-up of Algorithm^ define for t G 1 : T 

t 

Qf(dx 0:t ) = Qf (dxj) -A^s^^Xsjdxs-i), (5) 

S=1 


and consider the following hypotheses: 

(HI) Gt is continuous and bounded, HGfljoc < 00; 

(H2) F MtQt i (xt,Xi_ 1) satisfies Assumptions 3a and 3b of Theorem jlj (i.e. replace mt 
by Xit,Q t -i in these assumptions). 

Then, 

1. Under (HI) and the assumptions of Theorem ^! J one has (fort G 1 : T) 

sup II-A/Lqat (xj, dx 4 _i) — ( x t, dx t _i)|| E —>• 0, as N ->• + 00 . (6) 

x t e[o,i) d ' ‘- 1 


holds, and under (H2) and the assumptions of Theorem [7J one has (for 
tel:T) 


||Qf — Qt ||e 0, as N —>• +00. 


( 7 ) 


The first result above does not have a clear interpretation, but it will be used as an 
intermediate result later on. 


3.3. A generalization of Satz 2 of Hlawka and Muck (1972) 

Theorem |3] below generalizes Proposition ‘Satz 2’ of Hlawka and Muck: (1972) to the 
case where point sets in [0, l) rf are transformed through a Holder continuous Rosenblatt 
transformation; see Appendix |B.2 for a proof. 

Theorem 3. Let ir be a probability measure on [0, l) d and assume the following: 

1. Viewed as a function of x, (x) is Holder continuous with Holder exponent k G 

(OM]; 
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2. For i € 1 : d, the i-th coordinate of F n (x) is strictly increasing in Xi € [0,1), the 
i-th coordinate of x; 

3. For i G 1 : d, 7rj(xi : j_i, dxf), the distribution of the component Xi conditional on 
(x\,Xi—i) relative to 7r(dx), admits a density Pi{xi\x\ : i—\) with respect to the 
Lebesgue measure such that ||pi(-|-)lloo < +oo. 

Let u 1:W be a point set in [0, l) d and, for n E 1 : N, define x n = i ? “ 1 (u n ). Then, for a 
constant c > 0, 

||S(x 1:JV ) — 7t|| e < cD{u l:N f/ d 

where d = Ya=o 

When the Rosenblatt transformation F n is Lipschitz, d = d and we recover the result of 
Hlawka and Muck (1972). In this case, Assumption [3] is not needed. Notice that the rate 
provided in Theorem [3] decreases quickly with the Holder exponent k. For k = 1/2, the 
convergence rate is of order 0(D( u 1 ^) 1 / 2 ' _1 ) and hence is very slow even for moderate 
values of d. 

We will see that the backward step of the forward-backward SQMC smoothing al¬ 
gorithm amounts to applying to QMC point sets transformations that are “nearly” (1 /d)- 
Holder continuous (in a sense that we will make precise). The main message of Theorem[3j 
as far as SQMC is concerned, is that such an algorithm may be consistent (as N — > +oo) 
despite being based on non-Lipschitz transformations. 

Theorem [3] is interesting more generally, since the construction of low discrepancy 
point sets with respect to non uniform distributions is an important problem, which 


is motivated by the generalized Koksma-Hlawka inequality (Aistleitner and Dick 2014 
Theorem 1): 


N 


5 >( 


x - 


71—1 


Qo,i) 


</?(x)7r(dx) 


< ^HI|5(x 1:JV ) - 7r 


where V{(f) is the variation of ip in the sense of Hardy and Krause. It is also interesting 
to mention that the inverse Rosenblatt transformation is the best known construction of 
low discrepancy point sets for non uniform probability measures, although the bounds 
for the extreme metric given in Hlawka and Muck (1972 “Satz 2”) and in Theorem [3] are 
very far from the best known achievable rate since Aistleitner and Dick ( |2013 Theorem 
1) have established the existence, for any probability measure 7r on [0, l)“/"of a sequence 
(x n ) n >i verifying ||5(x 1:Ar ) — 7t||e = 0(N~ 1 (log N)°- 5 ^ 3d+1 ' ) ). 


3.4. Discrepancy conversion through the Hilbert space filling curve 

We now state results regarding how the Hilbert curve H : [0,1] —> [0, l] d conserves 
discrepancy. Such results were not directly needed to establish the consistency of SQMC. 
Indeed, as outlined in the statement of Theorem |TJ it was sufficient to show that P f N h has 
low discrepancy with respect to the proposal distribution Qt-l,h where we recall 

that P^ h = (/i(x/Q(), x/ :JV ), with € [0,1). The discrepancy of the “resampled” 


10 























But, again, we will need such results when 


particles x^i^ in [0, l) d was not derived, 
dealing with backward estimates. 

More precisely, and as explained below (see Section 5.2), the analysis of these latter 
require results on the conversion of discrepancies through the following mapping, defined 
for k E N, by 


H k : (x 0 ,..., x fc ) € [0, l)( fc+1 ) ^ (H(x 0 ),..., H(x k )) E [0, l) d ( fc+1 ) 


( 8 ) 


and with pseudo-inverse h k : [0, l) d i fc + 1 ) -» [0, l) fc+1 . 

Theoremjdjand Corollary [l] below are generalizations of Schretter et al. (2015 Theorem 
1), which corresponds to Theorem with k = 0, TT k the uniform distribution on [0,1) 
and = S(n 1:N ) for a point set vP™ in [0,1). To save space, the proofs the these two 
results are omitted. 


Theorem 4. Let 7r(dx) = 7r(x)Ad(fc+i)(dx), k E N, be a probability measure on [0, l) d ( fc+1 ) 
with bounded density n, iTh k be the image of ir by h k and (vt^)tv>i be a sequence of prob¬ 
ability measures on [0, l) fc+1 such that \\^ kk — 7T/iJ|e —>• 0 as N —> +oo. Let n N be the 
image by H k of ir^ . Then, 

11^ — 7t||e — > 0, as N — >• +oo. 

Corollary 1. Consider the set-up of Theorem [^] with k = 0 and let K : [0, l) d — > 
P([0, l) s ) be a Markov kernel, Kh(hi,dx 2 ) = i),dx 2 ) and P^ = (/i{ ;iV ,x3, ;Ar ) be 

a sequence of point sets in [0, l) 1+s such that, as N —>• +oo, ^(P^) — iTh® P"/j||e —>• 0. 
Let P N = {H{h\ N ),^ N ). Then, 

||S(P Ar ) — ix N ® K\\e —>• 0, asN^+oo. 


A direct consequence of this corollary is that, under the assumptions of Theorem [T] 
the point set P t N = is such that, as N —> +oo, 


- Q t -1 <8> m t || E -t 0. 


Another consequence of this corollary is that Algorithm [2] can be trivially adapted to 
forward smoothing, as briefly explained in the next section. 


4. SQMC forward smoothing 

Consider now the following extension of Algorithm |2j where full trajectories z t '■= xo -t E 
X t+l are carried forward: at time 0, set z(j := Xq, and, recursively, z? : = (z ”,x”), 
with z” := z^j 1 ■ I 11 addition, replace the Hilbert sort step of Algorithm 2 by the same 
operation on full trajectories: 

Hilbert sort: find permutation at -1 such that h t ( z^ 1 ^) < ... < /i*(z ^j 1 ^) 
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with hf the inverse of a Hilbert curve that maps [0,1] into [0, l] dt . In other words, 
this is the SQMC equivalent of the smoothing technique known as ‘forward smoothing’. 

Proposition 1. Under Assumptions 1-3 of Theorem ^ IJ and Assumption f’ 

4’. Qt(dz 4 ) = p i (z t )A^( t _|_ 1 )(dz i ) where pt(%t) is a strictly positive bounded 
density; 


one has, for t > 0 and the forward smoothing algorithm described above, 

N 

£ - Q, 

71=1 

where denotes the smoothing distribution at time t. 

See Appendix |B.3| for a proof. 

This result is presented for the sake of completeness, but it is clear that it is of limited 
practical interest. Transformations through H l will lead to poor convergence rates as 
soon as t becomes large, as per Theorem [l] In addition, there is no reason to believe that 
the SQMC version of forward smoothing would not suffer from the same major drawback 
as its Monte Carlo counterpart, namely that the N simulated paths quickly coalesce to 
a single ancestor. 


0, as N —>• Too 


(9) 


5. SQMC backward smoothing 

We now turn to the derivation and analysis of a SQMC version of backward smoothing. 
There exist in fact two backward smoothing algorithms. The first one (Doucet et al. 


2000), approximates the marginal smoothing distributions (QW(dxt) for t £ 0 : T; that 
is, the marginal distribution of x* relative to Q'r(dxo : r)- This may be used to compute 
the smoothing expectation of additive functions <£>(xo : t) = YlJ=o T*( x i) such as, e.g., the 


score functions of certain models (e.g. Poyiadjis et ah, 2011). See Section 5.1 


The second type of backward step (Godsill et ah, 2004) allows to estimate the full 


(joint) smoothing distribution Q(dxo : T)- Its SQMC version is given and analysed in 
Section 15.21 

These two algorithms share the following properties: (a) they require that the Markov 
kernel mt(xt_i,dxQ admits a positive probability density mtfx-t-i^t) which may be 
computed pointwise (for all xj_i,xt 6 A); (b) they use as input the output of a forward 
pass, i.e. either Algorithm [l] (SMC), or Algorithm [2] (SQMC); and (c) their complexity 
is 0(TN 2 ). 


5.1. Marginal backward smoothing 

To perform marginal smoothing, it suffices to compute, from the output of the forward 
pass, the following smoothing weights: 


W$ T ■= W t n x 


N 

£ 


W, 


t+i 


I T m t +i{ 


> x f+l 


)G t + i(x 


t > x mJ 


E„=i Wfm t+ i(x: 


■P v m 
t ’ 1 


)G' i +i(x : 


■P -v-m ) 
t ) x t+lJ 
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for all n £ 1 : N, and recursively, from t = T — 1, to t = 0. (For t = T, simply set 
WJ} = M4p.) Then 


N 

Qi|r(d■**) : = ^2 W^ T S x n (dx t ) « Q 4 | T (dx t ). 
n =1 

This particular backward pass may be applied to either the output of SMC (Algorithm 
[TJ, or SQMC (Algorithm^. In the latter case, the question is whether this approach 
remains valid. The answer is directly given by Theorem [2] under its assumptions, we 
have that 


||Qt|T ~ O^tIIe -> 0, as N ^ +oo 

since Qt\T (resp. Q^) is a certain marginal distribution of Qt (resp. )• In words, 
marginal backward smoothing generates consistent (marginal) smoothing estimates when 
applied to the output of the SQMC algorithm. 


5.2. Full backward smoothing 

The SQMC backward step to estimate the joint smoothing distribution Qy, proposed in 

, is recalled as Algorithm [3j 


Gerber and Chopin (2015 


Algorithm 3 SQMC Backward step for full smoothing 


Input: x^ 1 '^, Wp^' N ^ for t £ 0 : T, output of Algorithm m after the Hilbert sort step 
(i.e, for all n, m G 1 : N, n < m => < /i(x^™')) and u 1:jV a point set in 

[0, l) r+1 such that, for all n, m E 1 : N, n < m ==> < u™ 

Output: Xq!^) (N trajectories in X T+1 ) 

for n = 1 —> N do 

Compute Xj- = x^ T where with Ft,jv(Q = Ylm =l (m < i) 

end for 

for t = T — 1— > 0 do 
for n = 1 —»• A do 

~n ~ 1 ~ 

Compute x™ = x t * where a™ = Ar (x^ +1 , with Fj^xj+i, i) = 


Em=i WT W ( Xt+ i)I(m < *), and WT(x t+1 ) 

end for 
end for 


ty t m m 1 + i(xf,x t+ i)Gt + i(x”,x t+ i) 
££Li M / 'fm t+ i(xj , ,x t+ i)G t+ i(xj’,x t+ i) ■ 


Algorithm [3] generates a low discrepancy point set for distribution Qjf, the plug-in 
estimate of Qy, and is therefore the exact QMC equivalent of the backward step of 


standard backward sampling. 

To better understand why Algorithm [3] is valid, it helps to decompose it in two steps. 
First, it transforms u 1:Ar , a point set in [0, 1) T+1 , into , another point set in [0, 1) T+1 , 
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by applying the inverse Rosenblatt transformation of 


( 10 ) 

which is the image of probability measure Q^(cIxo:t), defined in ([5]), by mapping Iit ■ 
(xo,... , xr) (h(x o),..., /i(xt’)). Recall that Q^ h is the image of <Q >^ by h while, for 
any n € V([0, 1)) and t € 1 : T, n : [0,1) —> V([0. 1)) is a Markov kernel such that 

Mt^(h tl dh t -i) oc m t (H(h t -i),H(ht))Gt(H(h t -i),H(ht))iT(dht-i). 

In a second step, Algorithm [ 3 ] returns xj:^) where Xg , T = with the mapping 

Ht : [0,1) T+1 —> [0, l) d ( T + 1 ) defined in (J8j). 


5.2.1. L\— and L 2 —convergence 


A direct consequence of the inverse Rosenblatt interpretation of the previous section 
is that, when Algorithm [3] uses a RQMC point set as input, the random point 


0 :T 


is such that, for any function <p : [0, l) d ( T+1 ) — y M and for any n € 1 : N, we have 
E[<^(xg. T )| J-^} = Qy ( 99 ), with J-’p 1 the cr-algebra generated by the forward step. Together 
with Theorem [2] this observation allows us to deduce ^-convergence for test functions 
p> that are continuous and bounded (see Appendix B.4 for a proof). 


Theorem 5. Consider the set-up of the SQMC forward filtering-backward smoothing 
algorithm (Algorithms^ and^Sty and assume the following: 

1. In Algorithm [^, (u^ :iV )]v>i, t £ 0 : T, are independent random sequences of point 
sets in [0, I) dt , with do = d and dt = d+ 1 for t > 0, such that, for any e > 0, there 
exists a N e ^ > 0 such that, almost surely, D(uj ' N ) < e, \/N > N e j; 

2. In Algorithm [sj (u 1:JV )jv>i is a sequence of point sets in [0,1) T+1 such that 

a) Vn £ 1 : IV, u n ~ U{[ 0,1) T+1 ); 

b) For any function ip £ L 2 ([0, l) d( - T+1 \ X dt ), Var J2n=l ¥>«)) < Ca*r{N ) 

where a ^ = f j</?(u) — f (^(v)dv} 2 3 du, r(N) — > 0 as N — > + 00 , and where 
both C and r(N) do not depend on p>; 


3. Assumptions of Theorem^ and Assumptions H1-H2 of Theorem^ hold. 
Then, for any continuous and bounded function ip : X T+1 —> M, 


E 


S(H-o'.t )(t) ~ Qt(t) — t 0, Var (5 (xq:^)(<^)) — > 0, as N — > +00. 


Assumption [l] is verified for instance when u| :iV consists of the first N points of a 
nested scrambled (f, e^)-sequence in base b > 2 (Owen 1995 1997 1998). The result 
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above may be easily extended to the case where the u^ :JV ’s are deterministic (rather than 
random) QMC point sets. 

On the other hand, the point set u 1:jV used as input of the backward pass is necessarily 
random (for the result above to hold). But u 1:Ar does not need to be a QMC point set 
(i.e. to have low discrepancy). In particular, Assumption 2 is satisfied when the u 1:JV 


for a discussion one the use of QMC or pseudo-random numbers in the backward step of 
SQMC. 


are IID uniform variates (in [0,1) T+1 ); then C = 1 and r{N) = N 1 . See Section 5.3 


5.2.2. Consistency 


Compared to standard (forward) SQMC, establishing the consistency of SQMC backward 
smoothing requires two extra technical steps. First, as Algorithm [3] generates a point set 
hgly in [0,1) T+1 using the inverse Rosenblatt transformation of the probability measure 
defined in (10), and then projects it back to X T+1 through Ht, we need to establish that 
this transformation preserves the low discrepancy properties of . For this we will use 
Theorem [I] 

Second, the proof of Gerber and Chopin (2015) for the consistency of SQMC re¬ 
quired smoothness conditions on the Rosenblatt transformation of dxQ = 

mt(H(ht~ i), dxt), so that this transformation maintains low discrepancy, as explained in 
Section |2.5[ Due to the Holder property of the Hilbert curve, the Holder continuity of 


F mt implies that F mt h is Holder continuous as well. Similarly, for the backward step we 
need assumptions on the Markov kernel which imply sufficient smoothness for 

the Rosenblatt transformation of which is used in the course of Algorithm |3j 

to transform the QMC point set in [0,1) T+1 . 

To this aim, note that since 
expect that 


i£i 


_i||e —> 0 as N —> +00 (Theorem [lj) 


one may 


II K 


} N 


-M 


0, as N ^ Too. 


Therefore, we intuitively need smoothness assumption on this limiting Markov kernel to 
establish the validity of the backward pass of SQMC. However, note that the two argu¬ 
ments of this kernel are “projections” in [0,1) through the inverse of the Hilbert curve. 
Consequently, it is not clear how smoothness assumptions on the Rosenblatt transforma¬ 
tion of Mt Qt-i would translate into some regularity for the Rosenblatt transformation of 
. As shown below, a consistency result for QMC forward-backward algorithm 
can be established under a Holder assumption on the CDF of 

To establish the consistency of Algorithm [3] we proceed in two steps. First, we consider 
a modified backward pass which amounts to sampling from a continuous distribution. 
Working with a continuous distribution allows us to focus on the technical difficulties 
specific to the backward step we just mentioned without being distracted by complicated 
discontinuity issues. Then, the result obtained for this continuous backward pass is used 
to deduce sufficient conditions for the consistency of Algorithm [3| If this approach in 
two steps greatly facilitates the analysis, the resulting conditions for the validity of QMC 


15 







forward-backward smoothing have the drawback to impose that the Markov kernel rrit 
and the potential function Gt are bounded below away from zero (see Corollary [2] below). 

5.2.3. A continuous backward pass 

Following the discussion above, we consider now a modified backward pass, which amounts 
to transforming a QMC point set u 1:iV in [0,1) T+1 through the inverse Rosenblatt trans¬ 
formation of a continuous approximation Q^ hjl of h . 

To construct Qj- h. T ■ let h be the probability measure that corresponds to a con¬ 
tinuous approximation of the CDF of which is strictly increasing on [0, /i(xy T ^)] 
with Fq N (h(xy T ^)) = 1 and such that, under the assumptions of Theorems jlj and j^J 

IIQ T,h ~ QtvJIe = °(1)- 

Next, for t G 1 : T, let K^ h : [0,1) —> V([0, 1)) be a Markov kernel such that: 

1. Its CDF is continuous on [0,1) x [0, ^(x^ 1 ^)]; 

2. Vht G [0,1), the CDF of K^ h (ht, dht-i) is strictly increasing on [0, ^(x^ 1 ^)] 
with F k n (/i t ,/i(x^ l(iV) )) = 1; 

t,h 

3. Under the assumptions of Theorems [l] and and [2] 

sup \\ K t!h( h t,dh t -i) ~ (h t ,dht-i)\\ E = o(l). 

/i t e [o,i) * 

Finally, we define Q? h T e ^([0,1) T+1 ) as 

T 

QT,h T (dho: T ) := QT,/i(d/tT) JJ K t,h( h t, dht-i) 

t= l 

which, by construction, has a Rosenblatt transformation which is continuous on [0,1) T+1 . 

Remark that such a distribution } indeed exists. For instance, under the as¬ 
sumptions of Theorems jlj and |2j one can take for Q^ h the probability distribution that 
corresponds to a piecewise linear approximation of the CDF of h and, similarly, for 
ht G [0,1), one can construct K^ h (ht,dht-\) from a piecewise linear approximation of 
the CDF of M f F ]N (h tl dh t -i). 

For this modified backward step we obtain the following consistency result: 

Theorem 6. Let (u 1:iV )Ar>i be a sequence of point sets in [0,1) T+1 such that D(u 1:N ) —>• 
0 as N —y +oo. For n G 1 : N, let h^.j, = F~* (u n ) where h is as above. Suppose 

®T,h T ’ T 

that the Assumptions of Theorem [ 7 ] and Assumptions H1-H2 of Theorem^hold and that, 
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viewed as a function of x* and x*_i, ( x ti x f-i)> CDF of ( x i? dxj_i), 

is Holder continuous for all t E 1 : T. Let Xq. t = -Ht(^o-t)- Then, 

||5(xq:t ) ~ QtIIe —* 0 as IV —» +oo. 


See Appendix |B.5.2 for a proof. 


5.2.4. A consistency result for SQMC forward-backward smoothing 

We are now ready to provide conditions which ensure that QMC forward-backward 
smoothing (Algorithms [2] and [3j yields a consistent estimate of the smoothing distri¬ 
bution. The key idea of our consistency result (Corollary [2] below) is to show that, for 
a given point set u 1:Ar , the point set Xgi^ generated by Algorithm [ 3 ] becomes, as N in¬ 
creases, arbitrary close to the point set xj:^ obtained by the modified backward step 
described in the previous subsection. 

Corollary 2. Consider the set-up of the SQMC forward filtering-backward smoothing 
algorithm (Algorithms ^ ] and[3]) and assume the following holds for t € 0 : T — 1: 

1. (u 1:7V )at>i is a sequence of point sets in [0,1) T+1 such that D( u 1:JV ) —» 0 as N — > 
+ 00 ; 

2. Assumptions of Theorem^ and Assumptions H1-H2 of Theorem^hold; 

3. Fjfftq ( x i, x t— 1 ) is Holder continuous; 

4- There exists a constant c t > 0 such that, for all E A’ 3 , 

Gt(x t _ 1 ,x t )Gt + i(x t ,x t+ 1 )m t+ 1 (x t ,x t+1 ) > c t ; 


5. Gt(xt—i,xt)mt(xt—i,xt) is uniformly continuous on A 2 . 
Then, 


l|£( x 0:r) - 


0 as N —)• + 00 . 


See Appendix B.5.3 for a proof. Recall that the result above implies that 


1 

N 


N 


T: p(xf) -> as N —> +00 


72—1 


for any bounded and continuous ip, as explained in Section 2.2 


Assumption [4] is the main assumption of this result. This strong condition is the price 
to pay for our study of QMC backward smoothing in two steps which, again, has the 
advantage to facilitate the analysis by avoiding complicated discontinuity problems. We 
conjecture that this assumption may be removed by using an approach similar to the 


proof of Theorem 4 in Gerber and Chopin (2015). 
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5.3. An alternative backward step 

A drawback of Algorithm [ 3 ] is that it uses as an input a point set of u 1;iV of dimension 
(T + 1), although T is often large in practice. It is well known that high-dimensional 
QMC point sets do not have good equidistribution properties, unless N is extremely 
large. 

To address this issue, we may still use SQMC for the forward pass, but use as a 
backward pass Algorithm [ 3 ] with IID uniform variables as an input (i.e. input u 1;jV is 
replaced by N uniforms). Our consistency results still apply, since L>(u 1:JV ) — y 0 with 
probability one in that case (Niederreiter 1992, page 167). Of course, one cannot hope 
for a convergence rate better than N~ 1 / 2 for such a hybrid approach, but the resulting 
algorithm may still perform better than standard (Monte Carlo) backward smoothing 
(for fixed N), while being simpler to implement than SQMC with a QMC backward pass 
based on a point set of dimension T + 1. 

More generally, we could take u 1:iV to be some combination of a point sets and uniform 
variables, while still having D(u 1:N ) —> 0 (Okten et ah, 2006). However, we leave for 


further research the study of such an extension. 


6. Numerical study 


We consider the following multivariate stochastic volatility model (SV) proposed by Chan 


et al. (2006): 


y t = sl / 2 e t , 

xt = fJ, + $(x t _ 1 - /x) + ^v t , 


t > 0 
t > 1 


( 11 ) 


where St = diag(exp(xti), •••, exp(x^)), and T are diagonal matrices and (ej, u t ) 
■A/"2d(02<f, C) with C a correlation matrix. 


The parameters we use for the simulations are the same as in Chan et al. (2006): 
a = 0.9, Hi = —9, 1 pa = 0.1 for all i = 1,..., d and 


C = 


0.61d + 0.41,2 0,2 

Orf 0 . 81,2 + 0 . 21,2 


where 1^, 0^ and 1^, are respectively the identity, all-zeros, and all-ones d x d matrices. 
The prior distribution for xo is the stationary distribution of the process (x 2 )j>o. We 
take d = 2 and T = 399 (i.e. 400 observations). 

We report results (a) for QMC full backward smoothing (Algorithm [ 2 ] for the forward 
pass, then Algorithm [3] for the backward pass), and (b) for marginal backward smooth¬ 
ing (as described in Section 5.1). These algorithms are compared with their Monte 


Carlo counterpart using the gain factors for the estimation of the smoothing expectation 
IE[aqt|yo:T]) t £ 0 : T, which we define as the Monte Carlo mean square error (MSE) 
over the quasi-Monte Carlo MSE. Results for component X 21 of x* are mostly similar (by 
symetry) and thus are not reported. 

The implementation of QMC and Monte Carlo algorithms are as in Gerber and Chopin 
(2015). In SQMC, prior to the Hilbert sort step, the particles are mapped into [0, l) d 
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using a component-wise (rescaled) logistic transform. For SMC, systematic resampling 
(Carpenter et al. 1999) is used, and random variables are generated using standard 


methods (i.e. not using the inverse Rosenblatt transformation). The complete C/C+ + 
code is available on-line at https://bitbucket.org/mgerber/sqmc 

Figure [ 2 ] plots the gain factors at each time step, for either N = 2 8 (left), or N = 2 10 
(right). We observe that gain factors tend to increase with N (as expected) and that they 
are above one most of the time. They are not very high for full backward smoothing; 
but note that even a marginal improvement in terms of gain factor may translate in high 
CPU time savings, given that these algorithms have complexity 0{N 2 )\ i.e. a gain factor 
of 3 means that SMC would need 3 times more particles, and therefore 9 times more 
CPU time, to reach the same accuracy as SQMC. Notice also gain factors are higher for 
marginal smoothing. 

Finally, we compare Algorithm [3 


described at the end of Section 5.3 


(full backward smoothing) with the hybrid strategy 
i.e. a SQMC forward pass (Algorithm [ 2 ]) followed 
by a Monte Carlo backward pass. Again, this is for N = 2 8 (left) and N = 2 10 (right). 
Interestingly, the hybrid strategy (slightly) dominates at most time steps (excepts those 
such that T — t is small). As already discussed, the likely reason for this phenomenon 
is that the backward pass of Algorithm [3] is based on a point set of dimension T, which 
is too large to have good equidistribution properties (for reasonable values of N), and 
therefore to bring much improvement over plain Monte Carlo. Thus, for large T, one 
may as well use this hybrid strategy to perform full smoothing. 


7. Conclusion 

The estimation of the smoothing distribution p(xo : T|yo:T) is a challenging task for QMC 
methods because it is typically a high dimensional problem. On the other hand, due to 
the 0(N 2 ) complexity of most smoothing algorithms, small gains in term of mean square 
errors translate into important savings in term of running times to reach the same level 
of error. In this work we provide asymptotic results for some QMC smoothing strategies, 
namely forward smoothing, and two variants of forward-backward smoothing. In a simu¬ 
lation study we show that the QMC forward-backward smoothing algorithm outperforms 
its Monte Carlo counterpart despite of the high dimensional nature of the problem. Also, 
if one is interested in the estimation of the marginal smoothing distributions, more im¬ 
portant gains may be obtained. 

The set of smoothing strategies discussed in this work is obviously not exhaustive. For 


instance, we have not discussed two-filter smoothing (Briers et al. 2005), or its O(N) 


variant proposed by Fearnhead et al. (2010). In fact, our analysis can be easily applied 


to derive a QMC version of these algorithms and to provide conditions for their validity. 


An other interesting smoothing algorithm is proposed in Douc et al.J (2011), where the 
backward pass is an accept-reject procedure, leading to a O(N) complexity. A last 
interesting smoothing strategy is the particle Gibbs sampler proposed by Andrieu et al. 


(2010) which generates a Markov chain having the smoothing distribution as stationary 
distribution. For these last two methods, the usefulness and the validity of replacing 
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Gain factor (log 10 scale) Gain factor (log 10 scale) 


N = 2 8 


N = 2 10 




Time step Time step 

Marginal backward smoothing 


Figure 2: Smoothing of the bivariate SV model (11) for IV = 2 8 and N = 2 10 particles. 

The graphs give the gain factor (MSE ratio, from 100 replications) for com¬ 
paring SQMC with SMC, and for E[xti|yo : r] as a function of t. The top line 
is for full backward smoothing (Algorithm [3J), the bottom line is for marginal 
backward smoothing. 


20 















































N = 2 


N = 2 8 


10 



Figure 3: Smoothing of the bivariate SV model for N = 2 s and N = 2 10 particles. 

The graphs give the gain factor (MSE ratio, for 100 replications) of the hybrid 
backward pass (Algorithm[3]with IID input) relative to the QMC backward pass 
(Algorithm [3] with a QMC point set as input), for the estimation of E[xfi|yo : r] 
as a function of t. 


pseudo-random numbers by QMC point sets remain interesting open questions. 
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A. Main properties of the Hilbert curve 

Function H is obtained as the limit of a certain sequence ( H m ) of functions H m : [0,1] —> 
[0, l] d as to —>■ oo. The proofs of the results presented in this work are based on the 

following technical properties of H and H m . For m > 0, let Z,„ = {/^(/c) } fc=0 be 
the collection of consecutive closed intervals in [0,1] of equal size 2~ md and such that 
UZ^ = [0,1]. For k > 0, Sf n (k ) = H m (I ( ) ri (k)) belongs to S(! n , the set of the 2 md closed 
hypercubes of volume 2~ md that covers [0, l] d , = [0, l] d ; S^fk) and S d ,{k + 1) are 
adjacent, i.e. have at least one edge in common ( adjacency property). If we split I^fk) 
into the 2 d successive closed intervals I d n+ [ (ki), ki = 2 d k + i and i € 0 : 2 d — 1, then 
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the S'm+i (/c?;)’s are simply the splitting of S^(k) into 2 d closed hypercubes of volume 
2 ~d(m+l) ( nes fi n g property). Finally, the limit H of H m has the bi-measure property: 
Ai(.A) = \d(H(A)), for any measurable set A C [0,1], and satisfies the Holder condition 
\\H(xi) — H(x 2 )|| < Ch\xi — X 2 \ 1 ^ d for all x\ and X 2 in [0,1]. For more background on 


space-filling curves, see Sagan (1994). 


B. Proofs 

B.l. Backward decomposition: Proof of Theorem [2] 


Lemma 2 of Gerber and Chopin (2015) is central for the proof of this result and is 


reproduced here for sake of clarity. 

Lemma 1. Let ( tt N )n>i be a sequence of probability measures on [0, l) dl such that 


N 

7 T - 7T E 


0 as N —y +00 for some ir E V([0, l) dl ), and let K a kernel [0, \) dl 


V(\f),l) d2 )) such that is Holder continuous with its i-th component strictly 

increasing in X 2 i, i E 1 : c? 2 - Then 


7r 


N 


K — 7T ® iL||E = o(l). 

From Theorem [lj we know that (for t > 1) 

ll«S(f$) - Qt-i,h ® m t , h \\E = o(l) for P t N h = . 

To establish (|6]), we fix Xf + i, and recognise Mt+ l,Q t as the marginal distribution of xj, 
relative to joint distribution 


Gt+i(xi,x t+ i)Gi ! ft(/i t _i,xt) 


t—l,h 


m t ,h{ G t,h ) 


x Qt-i,h ® m t ,h x t )) 


( 12 ) 


with = Gt(H(ht~ i,x t )). This is a change of measure applied to Qt-i,h 1 

in a 

change of measure, but applied to S(PK). 


Similarly, _A4 (+1 is the marginal of a joint distribution obtained by the same 


Thus, we may apply Theorem 1 of Gerber and Chopin (2015), and deduce that (again 
for a fixed x^ + i): 


11 t+l ,6F ( x H-i ’ dx <) - -Mi+i,Q t ( x t+i>dx t )|| E = o(l). 


To see that the o(l) term in the above expression does not depend on x^+i, note that 


in (12), the dominating measure does not depends on x^ + i, and the density with respect 


to this dominating measure is bounded uniformly with respect to xj + 1 , and therefore the 


results follows from the computations in the proof of Gerber and Chopin (2015 Theorem 
1). This shows Q for t > 1. For t = 0 replace Qt-i,h m t,h, by m o,h in the above 
argument. 
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Let us now prove the second part of the theorem. As a preliminary result to establish 
([7]) we show that, for all t > 0, 

IIQt+i ® M t+1 ^N - Qt+i ® Af t+ i,Q t || E = o(l). (13) 

Let Bf and Bt+ 1 be two sets in £>r 0 1) d and note = Bf x Bt + 1 to simplify the 

notations. Then, 


Qi+l ® _ Qt+1 ® 

Ad (xt+i^t)) Qt+i(dxt+i) - Ad (^*+ 1 , 0 * ( x m> s t)) Q t+ i(dx t+ i) 


1 Bt+i 


< 


+ 


' Bt +1 


' Bt+i 


Ad (^-M t+ 1 ,Q t (xt+i,£t)) (Qt+1 - Qt+i) (dx t+ i) 


"t+i(dxt+i) A d - A d 


By assumption, L,v( l+I Q (xt+i,xt) is Holder continuous. Since ||Q^_ 1 — 
by Theorem [lj Lemma [l] therefore implies 


= oW 


sup 


'B t . 


Ad ( ^Mt+1,01 ( x £+l ’ 


AT 

t+1 


In addition, 


(dxt+i) 


o(!). 


3 Qt+i(dx t+ i) A d (j^M t+1 ^N ( x t+i, -Bt)) - A d (^f t+llQt (xt+i> s t)) 


< 


< 


Qt+i (dxt+i) sup Ad (T!m (xt+i,l?t)j - Ad [^^(xtfi,^ 


do,i) d 


Qt+i(dx t+ i) sup \\M t+1 §jv(x t+ i,dx: t ) - A4m,Qt( x t+i,dx t ) 


t+l: 


= o(l) 


using ©>• This complete the proof of ( |13[ ). 

We are now ready to prove the second statement of the theorem. Note that Q is true 
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for t = 1 by (13). Let t > 1 and B^-t 6 ■ Then, 

\t ® (dx t _i;i) M s ^_^ (x s , dx s _i) 


' -So:i 


t -1 


' Bq a 


S=1 


t -1 


L (dXi_l:t) M Si q s _, (x s , dx. 


S=1 


< 


+ 


' Bt- 1 :, 


' Bt-i-.t 


t -1 


' B 0 ,t -2 s=l 


s— 1 J 


tv 


(8) A 4 , — Qt (S> (cl x *-l:t) 


t -1 


'”®M t Q N (dxt_i :t ) 


' Bo:t — 2 s= l 


S-1J 


,, t -1 

Bo:t—2 s= l 


The first term after the inequality sign can be rewritten as 


At+_iw I TLt-i 


'St-1 


(t—i)rf ^® t s -iA4 s ,Q s _ 1 


(x t _i,5 0 : t _2)) (Qf - Qt (8 (dx t _i :t ) 


The supremum of this quantity over i?o : t £ ^[cn^ I s °(1) us i n S (131; the fact that 
Fy.t-\ m , I s Holder continuous (because iyvf s Q j is Holder continuous for all s) and 
Lemma [Tj 

To control the second term we first prove by induction that, for any t > 1, 


sup 

So:t —2SS 


[ 0 ,l) d 


,, t -1 

t -1 

/ n^^-o- 

/ IT jM ®.Q.-i( x »> dx «- 1 ) 

•t Bo.t -2 "- 1 

" BQit —2 S=1 


= 0 (!) 

(14) 


uniformly on x/._i. By (|6]) this result is true for t = 2. Assume that (14) holds for t > 2. 
Then 




f l 1 


4 S 0 :t-l s=l 

S=1 


t-i 


< 


+ 


M t,Qf_! (xt, dx t _i) (x s , dx. 

5=1 

t-1 

Xt,Q t _!(xt, dx t _i) Ms ! q s _ 1 (x s , dx s _i) 

S—1 

„ /t -1 t -1 

! (xt, dx t _i) / M s ,Q'F, ( Xs ’ dx,s —1) - n ( Xs ’ dx ^-i) 

1 1 ■tSo :t -2 \ s=1 1 S=1 

(xt— 1 , -H 0 : t— 2 )^ (^^.(Xt.dXt-l) — ( x t, dxt-l)^ 


A(t-i)d 


dx.s-i) 
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where we saw above that second term on the right side of the inequality sign is o(l) 
uniformly on Xj while the first term is bounded by 


'[o,D d 


M.^n (xf, dxj_i) 


x sup 


[ 0 ,l) d 


„ n -1 t -1 

M S) Q s _j(x s ,dx s _i) 

\ s= \ s s= 1 


where, by the inductive hypothesis, the second factor is o(l) uniformly on X(_i E [0, l) d . 
This shows that (14) is true at time t + 1 and therefore the proof of the theorem is 
complete. 


B.2. Generalization of 

Hlawka and Miick 

(1972 

): Proof of Theorem [3] 

The proof of this result is 

an adaptation of the proof of 

Hlawka and Miick 

(l972| “Satz 


2 ”). 

In what follows, we use the shorthand ajsr(B) = S(u 1 ' N )(B) = A r_1 Yln =l ^-b{u u ) for 
any set B C [0, l) d . One has 


||«S(x 1:JV ) — 7t|| e = sup \a N (F n (B)) - X d (F n (B))\. 

B£ ®[0,l) d 


Let (3 = \k 1 ], d = Y2i =o L an arbitrary integer, and V be the partition of [0,1)“ 
in L d congruent hyper rectangles W of size L~P d 1 x L~P d x ... x L -1 . Let B E £>[ 0 i)<z, 
U\ the set of the elements of V that are strictly in F n (B), U 2 the set of elements W E V 
such that W fl d(F 7T (B)) 7 ^ 0, U\ = U Ui, U 2 = U ZY 2 , and XJ[ = F n (B) \ U\ so that 
(Hlawka and Miick, 1972, “Satz 2” or |Gerber and Chopinj [~2 015| Theorem 4) 


| a N (F n (B)) - \ d (F V (B ))| < \q.n(Ui) - A d (£7i)| + #U2 {d( u 1:Ar ) + L~ d } 


where, under the assumption of the theorem, \a.]\r(Ui ) — Ad(C/i)| < L d 1 D( u 1:7V ) (see 




Hlawka and Miick 1972). 


To bound $77 2 , we first construct a partition V' of [0,1)“ into hyperrectangles W' of 
size X ... X L ' 1 such that, for all points x and x' in W\ we have 


\Fi(xi :i -i,Xi) - -F i (a/ 1 .j_ 1 ,x')| < L ^ \ i = 


(15) 


where Fi{x\-i-i : Xi) denotes the z-th component of F 7 r (x) (with = F\(xi) 

when i = 1). To that effect, let i E 2 : d and note that 


\Fi(x 1 ;i- 1 ,x i ) - F i (x’ / 1:i _ 1 ,x')| < |Fj(xi : j_i,Xi) - Fi(x 

+ |Ti(xi : j_i,x') --F i (x' 1:i _ 1 ,a;')|. 

By Assumption |3j the probability measure 7 Tj(xi : i_i, dx*) admits a density Pi(xi\x\ : i-\) 
with respect to the Lebesgue measure such that ||pi(-|-)||oo < +00. Therefore, the hrst 
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term after the inequality sign is bounded by |jj,;|| 0 C L / /dd \ For the second term, the 
Holder property of F n implies that 


- Fi{x < C w (i 
< C n (i 


1)k/ 2 (_^ / -/3 d+1 - i )K 
1 ^/2^ L /- / 9 d + 1 -y/^ 




with C n the Holder constant of F n . For i = 1, we simply have 

|F 1 (* 1 )-F 1 (z' 1 )| < Ibrllooi'^' 1 - 


Condition (15) is therefore verihed for L' the smallest integer such that L' > CL , for 
some C > 0 . 

Remark now that d(F n (B)) = F n (d(B)) since F is a continuous function. Let R G dB 
be a (d — l)-dimensional face of B and TZ be the set of hyper-rectangles W' G V such 
that R n W' ± 0 . Note that < L ,d ~ x < ( \CL\ + l)*" 1 . For each W' G 71, take a 
point r w ' G 7? n LF 7 and define 


-VF 


= JV(r w ') G F n (R). 


Let 7 Z be the collection of hyper-rectangles W of size 2 L~P x ... x 2L _1 (assuming L 

is even) and having point f " 7 , W' G TZ, as a middle point. 

For an arbitrary u G F n (R ), let x = i ? “ 1 (u) G 77. Hence, x is in one hyperrectangle 
W' G TZ so that using (fl5 ) 


~W' i 


|u* - r-’ | = |F)(xi :i _i,Xj) - R(r^i,rf')| < L ^ 


i = 1 ,..., d. 


This shows that u belongs to the hyper rectangle W G TZ with centre r w ' so that F n (R) is 
covered by at most #TZ = ^7 Z < {\CL\ + l ) d-1 hyperrectangles W ZiTZ. To go back to 
the initial partition of [ 0 , l) rf with hyperrectangles in V, remark that every hyperrectangle 
in TZ is covered by at most ci hyperrectangles in V for a constant ci. Finally, since the 
set dB is made of the union of 2 d (d— l)-dimensional faces of 77, we have 2 < C 2 L d ~ 1 
for a constant C 2 . 

Then, we may conclude the proof as follows 

||iS(x 1:Ar ) - 7r11 E < L d ~ l D{u 1:N ) + c 2 L d ~ x (l>(u 1:JV ) + 7T d ~) 
where the optimal value of L is such that, for some C 3 > 0, 


||5(x 1:JV )-7r|| E <c 3 R(u 1:JV ) 1 ^. 


B.3. Consistency of forward smoothing: Proof of Proposition [l] 

The proof amounts to a simple adaptation of Theorem [lj by replacing Assumption 4 by 
Assumption 4’ above, one obtains that — Qt—ltf ® m t,h ||e —> 0 as N —> + 00 , 
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where P^it = (/^(zjy^), x| ;Ar ) Q t _is the image by h t of Qt_i, and mt t h is defined as 
in Theorem [T] Therefore, by Corollary [TJ 

||5(zJ :JV ) - Q t _i <g> m t ||E -t 0, as IV->•+cx>. (16) 

In addition, since the Radon-Nikodym derivative 


t -1 <K> rn t 


(d(x 0 : t-i,x t )) oc Gt(xt_i, Xt-l), 


is continuous and bounded, Theorem 1 of Gerber and Chopin (2015), together with (16), 
implies ([9|. 


B.4. L 2 -convergence: Proof of Theorem [5] 

To prove the result, let tp be as in the statement of the theorem and let us first prove the 
L i-convergence. 

We have 


E 




■1:N\ 


LV/ 


+ E 


Q-T (v 5 ) ~ Qt(<p) 


By portmanteau lemma ( Van der Vaart| 2007] Lemma 2.2, p.6), convergence in the sense 
of the extreme metric is stronger than weak convergence. Hence, the second term above 
goes to 0 as N —> +oo by Theorem [2] and by the dominated convergence theorem. For the 
first term, as each u” ~ Z7([0,1) T+1 ), we have, by the inverse Rosenblatt interpretation 
of the backward pass of SQMC, 


E 


S(x^)(^)|T t =E5(*oIf T )|J T =Q^ hT (p°HT) = Q J l(p) 


:i:JV 


N 


;jv/ 


with J-j! the a-algebra generated by the forward step (Algorithm [2]) . Therefore, 


E 


|«S(*o;t)(vO - Qt(<p )I \Ft] < Var(5(x^)(^)|T T 


1/2 


where, using Assumption |2j and the fact that Xg. T = Ht o F- 


i 1 

T,hrp 


u 


Var(s(5®(p)|^)) <Cr{N)a^ N 


(17) 


(18) 


with cr^ j y < Q^((p 2 ) and with C and r(N) as in the statement of the theorem. Let 
e > 0. Then, by Assumption [l] and looking at the proof of Theorem [2] we have for N 
large enough and almost surely, Q^(p 2 ) < Qt( < 7 >2 ) + e so that, for N large enough, 


E 




(19) 


showing the Li-convergence. To prove the L2-convergence, remark that 
E [5(x^)(^)|^] = Qt (tO = (Qr (<p) ~ Qt(tO) + @t(<p) 
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and therefore 


Var (E 




■N 


= Var 


M ~ 


< E 


't(<p) ~Qr(<p)y 


where the right-hand side converges to zero as N —> +oo by the dominated convergence 
theorem and by Theorem [2] On conclude the prove using and the fact that 


Var («S(x£;r)(<p)) = Var(^E S(*$£)&) 


+ E 


Vai\S(xk#)(tp)\ 


B.5. Consistency of the Backward step: Proof of Theorem [6] and proof of 
Corollary [2] 

B.5.1. Preliminary computations 

To prove Theorem [6] we need the following two lemmas: 

Lemma 2. Let m E N, I = [0, |^] ; k € {0,1, 2 dm — 2} and B = H(I). Then, 
B = U ? = iB{ for some closed hyperrectangles Bi C [0, l} d and where p < 2 d (m + 1). 

Proof. To prove the Lemma, let 0 < mi < m be the smallest integer rh such that 
7^(0) C I and be the number of intervals in T d ni included in I. Note that i* mi < 2 d . 
Indeed, if i* > 2 d then, by the nesting property of the Hilbert curve, 


2—1 


, -1 


L m i 


-i(0)c \J i d mi {k)<z U 


7 ^(fc) C I 


k =0 


k =0 


which is in contradiction with the definition of i* ri ] . Define I 2 = I \ Ul d ni and i* m2 the 
number of intervals in I d ln included in I 2 . For the same reason as above i* < 2 d . More 
generally, for any mi < m^ < m, i* mk < 2 d meaning that the set B is made of at most 
'Y^k=mi * m. k — 2 d {m+ 1) hypercubes (of side varying between 2~ m and 2 -mi ). □ 

Lemma 3. Let (vr Ar )jv>l be a sequence of probability measures on [0, such that 

1171^ — 7t||e — > 0 where 7r(dx) = 7r(x)A(^+i)d(dx) is a probability measure on ir( k+1 > d that 
admits a bounded density i r(x). Let iTh k be the image by h f. of it. Then, 

\\ n h k ~ TTft-fcH e -t 0, as V —> +oo. 


The proof of this last result is omitted since it follows from the properties of Cartesian 
products and from straightforward modifications of the proof of Gerber and Chopin 
(2015, Theorem 3). 


B.5.2. Proof of the Theorem |6| 

To prove the theorem first note that 

IIQ T,hr ~ = o(l). 
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Indeed, by assumption 

Theorem 3), 


Q n - 

^ T,h 


Gerber and 


'’t/JIe = o(l) and, by Theorem jlj and 
l \tTh ~~ = o( 1) since Qt admits a bounded density 

1). Hence, ||Q= o(l) and thus, by Theorem 
1 the image by H of Q^ h - In addition, using the same 


|Chopin (2015 
(Assumption ij of Theorem 
[4} ||Qy - Qt |e = o(l), wit 
argument, and using the fact that, for all t 6 1 : T, Gt is bounded (Assumption HI of 
Theorem [2j, we have, by Theorem [2] (hrst part), 


sup ||Af(x t ,dx t _i) - -Mt Q^x^dxt-i^lE = o(l), t G 1 : T 


with Kjr (xj, dx*_i) the image by H of the probability measure Kt h(H(xt), dht-±). Con¬ 
sequently, by the second part of Theorem 


image by Ht of Q 


N 

T,h T ' 


Finally, under t 


2J ||Qjf — Qt||e = o(l) where denotes the 
le assumptions of the theorem, Qt admits a 


bounded density (because for all t, Gt is bounded and Qt admits a bounded density) and 


thus, by Lemma 


\qn 
\^T,h T 


— Qt,/i t ||e = o(l). 


To prove the theorem it therefore remains to show that 

II«5(^0:t) — Qt^IIe = o(l). 

Indeed, this would yield ||«S(^o-t) — ||e = o(l) and thus, by Theorem]^] 

IISC^o-t) - QtIIe = 0(1) 


( 20 ) 


as required. 

To prove ( |20[ ), we assume to simplify the notations that Fjyi t Q (xj, x^_i) is Lipschitz. 
Generalization for any Holder exponent can be done using similar arguments as in the 
proof of Theorem [3] 

Let h™ = /(.(xf ) where xj :N are the particles obtained at the end of iteration t of 
Algorithm [2] We assume that, for all t G 0 : T, the particles are sorted according to 
their Hilbert index, i.e. n < m =>■ /i" < h” 1 (note that the inequality is strict by 
Assumption [I] of Theorem [I]). Then, using the same notation as in the proof of Theorem 
[3) one has 


\\s(K 


1:N\ 


- Q N 


T,h T I 


sup 

BgB n 

£ ’ fe£ ’[o,i) T +i 


a.N 



At+i 



where S^ 1)T+1 = {[a, 6] C H [0 i1 )t+i, i>f < /if, i € 0 : t|. 

The beginning of the proof follows the lines of Theorem [3j with (3 = d and d replaced 
by T + 1. Let d = ^Qt=o ^ so that, for a set B € 


a N (Fqn^ (B)^j - X T+ i ( F Q* hr (#)) < W") + m 2 {D(u ln ) + L" J } 


where L and U 2 are as in the proof of Theorem [3j 
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Following this latter, let V' be the partition of the set [0,1) T+1 into hyper rectangles 
W' of size L'~ dT x L'~ dT X ... X L /_1 such that, for all h and h! in W' we have 


1*6" (^*) — *6". (^i)l — * 


-d T 


and 


F 1 N _ 1 (h i - 1 ,h i )-F t N _ 1 (h’_ 1 ,h') 


< L 


-<F+ 1 - i 


i E 2 : (T + l) 


( 21 ) 


( 22 ) 


where, to simplify the notation, we write QQj (h. •) the CDF of K^_ i+2 h(h , d/iT-j+i)- 
Let us first look at condition (21). We have 


“ *Q" h (^l)l - 2 ll* 1 QTh _ f Q(fJI°° + “ ^Jloo + UW*0 ~ F Qt ^K)\ 

< 2n(N) + 2r 2 {N) + \FiQ Th (hi) — Tq^^)) 


with ri(N) = HFgjv - Fq n ll^ and r 2 (N) = ||Q ^ h - Qt+||e; note n(N) -+ 0 by the 


*T,h 


construction of Qj h and under the assumptions of the theorem while r 2 (N) —> 0 by 
Theorem [I] and by Gerber and Chopin (2015, Theorem 3) 

Let L 1 = 2 m for an integer m > 0, so that hi and h! i are in the same interval I d T -i (k) E 




i E 1 : (T+l). Then, since hi and h\ are in the same interval lir-i (k) E FL_ 1 


I FQrJh) ~ F QTh (h[)\ < Q T, h (+-.„+)) = Qt (•+->„+)) 


< IIptIIoo 

- (l'W t 


as Qt admits a bounded density pr- Hence (21) is verified if 

IIptIIoo 


L > Lkjsr, k]\r 


I - L dT rl(N)) 


i/d T 


r* 1 {N) = 2r 1 (N) + 2r 2 (N), 


which implies that we assume from now on that L~ a > 2r\{N ) for N large enough. 

Let us now look at (p2|) for a i > 1. To simplify the notation in what follows, let 


T i _ 1 (/t, •) be the CDF of . ~ N (h,d/iT-i+i) and Fi-i(h, ■) be the CDF of 

T-n-2,Q T _ i+1 h 

\A h 

l T— *+2,Qt-;+i 


(h, d/iT-i+i). Then, 


F^h^hi) - F^hUX) 

< 2\\F?_i - ^lU + 2||Tf r - Fi-iHoo + | Fi-tihi-uhi) - F^iih'^iX)] 

= 2r 3 (N) + 2 U (N) + \F^hi-i, K) - F^K-i, K)\ 

with r 3 (JV) = ||- Tl^Hoo and r 4 (N) = || F^ - T^iHoo; note r 3 (N) -+ 0 by the 
construction of K^_ j+ 2 ,h and under the assumptions of the theorem while r 4 (N) —> 0 by 
Theorem [2] and Gerber and Chopin (2015 Theorem 3). 
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To control hi) - F i - 1 (h' i _ 1 , /r')|, assume without loss of generality that hi > 

h[ and write h[) = GT-i+ 2 (H(hi-\),F[(h' i )) to simplify further the notation. 

Then 


\Fi-iihi-!, h) - F i - 1 (h , i _i, h \)| < \Fi-i(hi-i, h'i) - F i - 1 (h l i _ 1 , h[)\ 

rhi 

+ 


G (hi-i,v)Q T -i+i,h{ dv ) 


I h- 


The second term is bounded by \\G T -i+2\\ooQT-i+i,h([hi, hi]) < ||Gt-z+2||ooQt-|+i(W') 
where W G <S(/T-i m - Since Qx-i+i admits a bounded density, we have, for a constant 
c > 0 , 

\\G T -i+ 2 \\^T-i + l,h([KM) < cL~ dT+1 -\ 


To control the other term suppose first that h! % > L' dT +1 and let k be the largest 


integer such that h! % > kL 


l-d T ~ i+1 


. Then, 


\Fi-i(hi-i, h'i) — Fi-i(h'i_i, h[)\ 
rh ' 


< 


+ 


»kL' 


rh: 


lkL '~ dT ~ i+1 L 

Then, using by Lemma [ 2 j we have for the first term: 

rkL'- dT ~' l+ ' 


G h i(K- i, v ) - G h t (K-^v) \ Q T -z+i,fc( dv) 

Gi(hi-i,v) - G^h'-^v)] ® T -i+i, h ( dv) 
GHhi-i,v) ~ Gi(ti-i,v)] ® T -i+i,h(dv) 


( 23 ) 


I o 


G^hi-^v) - G^(h' i _ 1 ,v)^ ® T - i+ 1 , h (dv) 

ki 


3= 


< 


E 

3 = 1 

ki 


G T -i+ 2 (H(hi-i),x) - G T —i+ 2 (H(h(_^), x) Q T _ m (dx) 


3 = 1 


pcdf 

M T -i+ 2,Q T _ i+1 




+ 


pcdf 

MT-i+2,q T -i+i 


(»(*_!>, (m-d, bj)|} 


where VLj = [aj, bj] C [ 0 , l) d and where ki < 2 d (d T l m + 1 ). Let Ci be the Lipschitz 
constant of F c , d j . Then, for any c G [ 0 , l) d , 

■ M T-i+2,Q T _ i+1 1 J L ’ ' ’ 

Whi-l),c) - i).c)j < QUHOH-i) ~ H(hU)\\~ 


< CiL 


r-d T ~ i+1 
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because H(h{- 1) and H{h belong to the same hypercube W G 5 of side 


2 —md T * +1 _ j^t—d T 1+1 


For the second term after the inequality sign in (23), we have 


rK 



JkL’- dT ~ i+1 


QT-i+l,h{dv) 




< M T -i + 2 ,Q T _ i+lth (hi-i, [kL‘ 


r-d T ~ i+1 


> hi]) + -A / l T _ i+ 2 ! Q T _ i+1 [kL 

< XT- i+ 2,Q T _ i+1 (^(^-l), VF) 

<2||G r _, +2 || 00 || Pr _ i+1 || fV1 2- m ' iT - ,+1 

d 


r—d T ~ i+1 


KY) 


for a VF G 5 ~jr~i and where pr~i +1 is the (bounded) density of Q'r_j + i. This last 
quantity is also the bound we obtain for /i( < L'~ dT +1 . Hence, these computations 
shows that 

hi) - h')| < c i L'- dT ~ i+1 log(L') 

for a constant c*, i G 2 : (T + 1). 

Condition (22) is therefore verified when (taking L' so that log(L') > 1) 

1 

V 


> L 


max 


(jT—i +1 


log(L') ie{2,...,T+i} \1 — L dT l+1 r^(N)) y 

where r^(N) = 2rs(N) + 2r+N). Let 7 G (0,1) and note that for N large enough 
log-Z/ < L' 1 . Hence, for N large enough (21) and (22) are verified for L' the smallest 
power of 2 such that 


L' > (k N L)^~^ \ k N = 


max 


ie{i,...,T+i} \1 - L dT i+1 r*(N)) 


dT — i + 1 


ci = \\pt\\ 


where r*(N) = r*(N) + r^N). Note that we assume from now on that L d ' > 2r*(N). 
Because the function F~. N is continuous on [0, hn] x ••• x [0 ,hS], d(F~ N ( B )) = 


Fqn (9(B)) and therefore we can bound +U 2 following the proof of Theor 


• 1 , nrp 

rem pi 


Using 

the same notations as in the proof of Theorem |3j we obtain that Qi^ hT {d(B)) is covered 
by at most 


(T + l)24^Lf 


d-1 

-7 


hyperrectangles in TZ. To go back to the initial partition of [0, l) 2 "" 1 " 1 with hyper rectangles 
W G V, remark that L' > L so that every hyperrectangles in TZ is covered by at most c* 
hyperrectangles of V for a constant c*. Hence, 


#Z^ 1} <c n L^ : 


c N = c*(T+l)2 d kl~\ 


(24) 


We therefore have 

IIS(/@ - Q^ t ||e < L d D(u 1:N ) + c n L^ (d(u 1:N ) + 
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Let 7 £ (0 ,d x ) so that q := d — > 0. To conclude the proof as in Gerber and 


Chopin (2015, Theorem 4), let d\ = d T and J 2 = Thus, 

\\S(J%$) - Q^b < 2L^D(u 1:N ) + c n L ~ Cd 

_ 1 

where the optimal value of L is such that L = 0(D( u 1:JV ) c d+ d i + d 2 ). Then, provided that 


r*(N)D( u 1 '- n ) c d+ d i+h = 0(1), L verihes all the conditions above and, since cjv = 0( 1), 
we have 

II S(h%£) - Qt,/i t IIe = o (d( u 1 :Ar )^T3^ . 


Otherwise, if r*(N)D(u 1:N ) c d+ d i + d 2 —y + 00 , let L = 0(r*(N) A). Then cn = 0(1) 
and 

~ c d c d+ d l+ d 2 

L dl+ d2 D ( u 1:N ) = 0(r(lV)) 3 i dl D( u 1:N ) 

( c d+ d l+ d 2 

d\ \ 

0(r(N))~ 1 D(u 1:N )-d+ d i+ d 2 


= o [r 


Therefore || S(h^) - Q^ t ||e = o(l), which concludes the proof. 


B.5.3. Proof of the Corollary [2] 

To prove the result we first construct a probability measure Qsuch that the point 
set generated by Algorithm [ 3 ] becomes, as N increases, arbitrary close to the point 
set Xgi^ obtained using a smooth backward step described in Theorem [6] Then, we show 
that, if ||5(xg:y) - Qt||e -t 0, then||5(xj:^) - Qt||e -> 0. 

To this aims, assume that, for all t E 0 : T, the points h\' N are labelled so that 
n < m =7 h™ < h™. (Note that the inequality is strict because, by Assumption [I] of 
Theorem KU the points xj [N are distinct.) Without loss of generality, assume that hj > 0 
and let h^= 0 for all t. 

To construct Qj^ t , l e t Qt/i be suc h that Fq N is strictly increasing on [0 ,hj,] with 
F^n (hji) = Tkiv {hr) for all n E 1 : N and, for t E 1 : T, let K^ h (ht,dht-\) be such, 

y T,/i **T,h 1 ’ 

for all ht E [0,1), F K n (ht 7 *) is strictly increasing on [0, h^_-^ cind 

v t,h 

F k n (h t , h n t _ r) = F m h (h t , h n t _ r), Vn E 1 : N. 

t,n + Qj\ 

z,vl t-l,h 

Let hgiy be as in Theorem I (with Q^ hr constructed using the above choice of h 
and K^.(hi,dht~i)). We now show by a backward induction that, for any t E 0 : T, 
max nel:A r ||xf - x£||oo = o(l). 
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To see this, note that, by the construction of Qy h , 

| hj, — hj ,| < Ay, Ay := max (hST 1 — /iy| 

nGl:iV 

Lemma 2), Ay -> 0 as iV + 00 . Hence, using 
the Holder property of the Hilbert curve, this shows that max ng i : jv ||xy — Xy||oo = o(l). 

Let t G 0 : T — 1 and assume that max ng i : jv ||xj ! . 1 — xJL 1 || 0O = o(l). Let vj™ = /i(x 4 ‘), 
where a” is the index selected at iteration t of Algorithm [3] obtained by replacing x” +1 
by x" +1 . Then, by the construction of K^ h . max ng i : jv | w™ — hy| = o(l). 

We now want to show that max ng i : jv | w™ — /iy| = o( 1). To simplify the notation, 
let mt+i(xt,xt+i) = m t+ i(x t ,x t+ i)G tl (x t ,x t+ i). Then, using Assumption 0 simple 
computations show that, for m £ 1 : N, 


where, by Gerber and Chopin (2015 


\wrw +1 )-w t m w +1 )\< 


IWTm m (xr,x- +1 ) - wrm + i(xr,x? +1 )| 




x tA x t+lJ 


+ r%i(x; 


t i x i+l 


Ef=i W? fht+ii^t ,*? +1 ) - EfcLi Wt'"»t+i( x t > x m) 


(EfcLi ( x *, x m)) (EfcLi Wi fc ^t+i(x + fc , x n 


t > A t+1 


)) 


<I|G 


|hi t+ i(x: 


t 00 ' 


m ~n \ 
t 1) 


mt+iWk H ,Xt+i) 


Nc t 


+ \\Gtfht+i || c 


Y,k=i G t(^t-i^t) h+ijjjVi) -?h f+ i(x^,x^ +1 )| 

(^A) 2 


Let 

wt+i(5) = sup |f^t+i (xi, x 2 ) — rht+i ( x i, x ; 2 ) |, (5 > 0 

(xi,x 2 )eA’ 2 , (x^x^eAf 2 
||xj—xJHoo^i, i=l,2 

be the modulus of continuity of rht+i. Then, 


\W}W +1 ) - W?(x? +1 )| < max 

n£±:N 


w’m(l x m - xf +1 |oo) WGtWooiot + \\G t rh t+ i\\oo) 


N 




N 


where, using the fact that rht+i is uniformly continuous on X 2 (Assumption [ 5 ]) and the 
inductive hypothesis, $ = o(N _1 ). Also, we know that 


min inf WT(x t+1 )>ef:= 

m£l:N Xt+iEX 


Qt 

JVllGtllmt+iHoo 


Then, let N t be such that so that, for N > N t . we either have h™ = wf, or h™ = 

< +1 or ht = in " -1 . Hence, max ng i : jv | w™ — h™\ = o(l) and therefore rriax ng 1 : /v |hf — 
h ™| = o(l). Finally, by, the Holder property of the Hilbert curve, this shows that 
max ng i : jv |lx" - x^lloo = o(l). 


36 













The rest of the proof follows the lines of Niederreiter (1992 Lemma 2.5, p.15). First, 
note that the above computations shows that, for any e > 0, there exists a N e such 
that Hxgiy — XqI^Hoo < e for N > N e . Let B = [a, b], B + = [a, b + e] n [0,1) T+1 and 
B~ = [a, b — e]. If e > for at least one i E 1 : (T + 1), B~ = 0. Then for N > N e , we 
have 


S(^)(B~) < Stf$)(B) < S(x^)(B+). 
By the definition of the extreme metric, we have 


S(± 1 0 :.t)(B + )-Qt(B + ) 

S(*$$)(B-)-® t (B-) 


< ||<S(xcj;jO 

< I|5(xo;t) 



Combining (25) and (26) yields: 


- - Qt(B-)) - HSffi:*) - QtIIe < S(x^)(B) - Q t (B) 

S(x^)(B) - Q t (B) < (Q t (B+) - Q t (B)) + ||S(x^) - Qt||e- 


(25) 


(26) 


(27) 


Using the fact that Qt admits a bounded density, we have for a constant c > 0 

Qt(B) - Qr(B-) < c\ t+1 (B \ B~) < ce T+1 
Qt{B+) - Q t (B) < c\ t+ 1 (B+ \ B) < ce T+1 . 


Therefore, combining (27) and (28), we obtain, for N > N e and for all B G B r 0 i)T+i, 


-ce T+1 - ||5(xo!r) - Qt||e < S(xq^)(B) - Q T (B) < ||5(xq;t) ~ Qt||e + ce T+1 
and thus 

!||5(xo;t) -QtIIe < ||5(xo;t) -QtIIe + cc™ -1 
and the result follows from Theorem [HI 
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